UNCLASSIFIED

ANALYSIS OF A SIMPLE DEBUGGING MODEL (U). WASHINGTON UNIV SEATTLE DEPT OF STATISTICS. A. E. RAFTERY. 10 NOV 86. TR-80. N00014-84-C-0169. F/G 9/1


CM 

CM 

c\j 


ANALYSIS  OF  A  SIMPLE  DEBUGGING  MODEL 




by 


Adrian  E.  Raftery 


TECHNICAL  REPORT  No.  80 
March  1986 


Department  of  Statistics,  GN-22 
University  of  Washington 
Seattle,  Washington  98195  USA 


Analysis  of  a  Simple  Debugging  Model 

Adrian  E.  Raftery 

Department  of  Statistics, 

Trinity  College, 

Dublin  2, 

Ireland. 

ABSTRACT 

A  system  has  an  unknown  number  of  faults.  Each  fault  causes  a  failure  of 
the  system,  and  is  then  located  and  removed.  The  failure  times  are  independent 
exponential  random  variables  with  common  mean.  A  Bayesian  analysis  of  this 
model  is  presented,  with  emphasis  on  the  situation  where  vague  prior  knowledge 
is  represented  by  limiting,  improper,  prior  forms.  This  provides  a  test  for 
reliability  growth,  estimates  of  the  number  of  faults,  an  evaluation  of  current 
system reliability, and a prediction of the time to full debugging. Three examples are given.
1. INTRODUCTION


Consider  a  system  with  an  unknown  number  of  faults  N .  Each  fault  causes  a  failure  of  the 
system,  and  is  then  located  and  removed.  The  times  at  which  the  N  failures  occur  are  assumed  to 
be independent exponential random variables with common mean $\beta^{-1}$. Early analyses of this
model  were  carried  out  by  Bazovsky  (1961,  chap.8)  and  Cozzolino  (1968).  It  has  been  much 
studied  in  the  software  reliability  literature,  where  it  is  often  attributed  to  Jelinski  and  Moranda 
(1972). 

Problems  of  interest  include  finding  the  probability  that  all  the  faults  have  been  removed, 
estimating  the  number  of  remaining  faults,  evaluating  the  current  reliability  of  the  system,  and 
predicting  the  time  to  full  debugging.  Another  question  is  whether  the  system’s  failure  rate  is 
decreasing, as the model predicts. Littlewood and Verrall (1981) and Ascher and Feingold
(1984, pp. 110-111) emphasised the need to test this assumption, and described software
reliability data sets in which the failure rate increased over long periods of time.

My  aim  here  is  to  develop  methods  which  can  provide  solutions  to  such  problems,  as  well 
as  a  framework  for  making  decisions,  such  as  when  to  stop  debugging.  My  approach  is 
Bayesian,  with  an  emphasis  on  the  situation  where  vague  prior  knowledge  about  the  model 
parameters  is  represented  by  limiting,  improper,  prior  forms. 

Much  previous  research  has  focussed  on  point  estimation  of  N  (Blumenthal  and  Marcus 
1975;  Joe  and  Reid  1985;  Watson  and  Blumenthal  1980).  This  is  a  difficult  problem;  for 
example,  the  maximum  likelihood  estimator  (MLE)  of  N  can  be  infinite  with  substantial 
probability.  Indeed,  Goudie  and  Goldie  (1981),  who  studied  the  case  where  the  observed  number 
of failures is specified in advance, concluded that all standard non-Bayesian techniques are liable
to fail. My approach does yield estimators of N; these are described and compared with other
estimators in Section 3.

Forman  and  Singpurwalla  (1977)  proposed  a  stopping  rule  for  debugging  the  system  based 
on  how  close  the  observed  likelihood  is  to  a  large-sample  approximation;  their  aim  was  to 
ascertain whether the system had been fully debugged. Their data are reanalyzed in Section 5. I
hope that this paper provides a more precise answer to that question, as well as the basis for a
more  general  stopping  rule,  which  explicitly  takes  into  account  the  costs  associated  with  the 
various  possible  outcomes. 


2.  TESTING  FOR  RELIABILITY  GROWTH 

I assume that the system has been observed for the period $[0, T]$, during which $n$ failures
have occurred at times $t = (t_1, \ldots, t_n)$, where $n > 1$. I consider the problem of comparing the
model described in Section 1 with the constant-rate Poisson process $M_0: \lambda(s) = \mu$, where $\lambda(s)$ is
the rate of occurrence of failures at time $s$.

I  assume  that  the  sample  space  consists  of  systems,  rather  than  of  replications  of  the 
debugging  process  for  the  same  system.  N  is  thus  a  random  variable,  and  I  assume  that  it  has  a 
Poisson  distribution.  It  then  follows  that  the  model  is  equivalent  to  a  non-homogeneous  Poisson 
process  with  rate  function 

$$M_1:\ \lambda(s) = \mu \exp(-\beta s), \qquad (2.1)$$

where $\mu > 0$ and $\beta > 0$ (Scholz 1986). Non-Bayesian statistical analysis of this process has been considered
by Cox and Lewis (1966), Lewis (1972), MacLean (1974), and Berman (1981).

The comparison of $M_0$ with $M_1$ is based on the Bayes factor, or ratio of posterior to prior
odds for $M_0$ against $M_1$,

$$B_{01} = p(t \mid M_0)\,/\,p(t \mid M_1), \qquad (2.2)$$

the ratio of the marginal likelihoods. In (2.2),

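$$p(t \mid M_0) = \int_0^\infty p(t \mid \mu, M_0)\, p(\mu \mid M_0)\, d\mu,$$

$$p(t \mid M_1) = \int_0^\infty \int_0^\infty p(t \mid \mu, \beta, M_1)\, p(\mu, \beta \mid M_1)\, d\mu\, d\beta.$$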

If the priors $p(\mu \mid M_0)$ and $p(\mu, \beta \mid M_1)$ are proper, (2.2) can be evaluated directly.

I now develop an expression for $B_{01}$ in the situation where vague prior knowledge is
represented by limiting, improper, prior forms. I use the standard vague prior for $\mu$,

$$p(\mu \mid M_0) = c_0\,\mu^{-1} \qquad (2.3)$$

(Jaynes 1968). The likelihood for $M_1$ is

$$p(t \mid \mu, \beta, M_1) = \mu^{n} \exp\{-\beta S - \mu\beta^{-1}(1 - e^{-\beta T})\},$$

where $S = \sum_{i=1}^{n} t_i$. This is an exponential family likelihood, for which a natural family of conjugate
prior densities is

$$p(\mu, \beta \mid M_1) \propto \mu^{k_1} \exp\{-k_2\beta - k_3\,\mu\beta^{-1}(1 - e^{-\beta T})\}. \qquad (2.4)$$
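The likelihood for $M_1$ is the general Poisson-process log-likelihood, $\sum_i \log \lambda(t_i) - \int_0^T \lambda(s)\,ds$, specialised to the rate function (2.1). A minimal sketch that checks this numerically (the function names `loglik_m1` and `loglik_nhpp` and the parameter values are illustrative only):

```python
import math


def loglik_m1(mu, beta, times, T):
    """Log-likelihood of M1: n*log(mu) - beta*S - (mu/beta)*(1 - exp(-beta*T))."""
    n = len(times)
    S = sum(times)
    # (mu/beta)*(1 - exp(-beta*T)) is the rate (2.1) integrated over [0, T];
    # -expm1(-x) computes 1 - exp(-x) accurately when x is small.
    integrated_rate = mu * (-math.expm1(-beta * T)) / beta
    return n * math.log(mu) - beta * S - integrated_rate


def loglik_nhpp(rate, integrated_rate, times, T):
    """Generic NHPP log-likelihood: log rates at the failure times minus the integrated rate."""
    return sum(math.log(rate(s)) for s in times) - integrated_rate(T)


mu, beta = 2.0, 0.3
times, T = [0.5, 1.2, 3.0], 5.0
direct = loglik_m1(mu, beta, times, T)
generic = loglik_nhpp(lambda s: mu * math.exp(-beta * s),
                      lambda u: mu * (1.0 - math.exp(-beta * u)) / beta,
                      times, T)
print(direct, generic)
```

Writing the likelihood as $\mu^n \exp(-\beta S) \times \exp\{-\mu\beta^{-1}(1-e^{-\beta T})\}$ makes the sufficient statistics $n$ and $S$ visible. These statistics are what the conjugate form (2.4) is built on.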

Akman and Raftery (1986b) have shown that the unique prior of the form (2.4) for which $B_{01}$ is
invariant to scale changes in the time variable and independent of the stopping time $T$ is

$$p(\mu, \beta \mid M_1) = c_1\,\mu^{-2}. \qquad (2.5)$$


However, the Bayes factor calculated using the improper priors (2.3) and (2.5) involves an
arbitrary, undefined, multiplicative constant $c_0/c_1$. Akman and Raftery (1986b) have shown
how this may be assigned using the minimal imaginary experiment idea of Spiegelhalter and
Smith (1982). This consists of imagining that an experiment is performed which yields the
smallest possible data set permitting a comparison of $M_0$ and $M_1$, and provides maximum
possible support for $M_0$. It is then argued that the resulting Bayes factor should be only slightly
greater than one. Raftery and Akman (1986) have applied this approach to the change-point
Poisson process; their results may be compared with the non-Bayesian solution of Akman and
Raftery (1986a).

For the present problem, this procedure yields $c_0/c_1 = 0.6449$, and

$$B_{01} = 0.6449\,(n-1)\left[\int_0^\infty e^{-Ry}\left\{\frac{y}{1 - e^{-y}}\right\}^{n-1} dy\right]^{-1}, \qquad (2.6)$$

where $R = S/T$. Strictly speaking, any value of $B_{01}$ greater than one indicates that the data
provide evidence against reliability growth. However, as a rough order-of-magnitude
interpretation, Jeffreys (1961, Appendix B) has suggested that the evidence should be regarded
as strong only if $B_{01} \ge 10^{1}$, and as decisive only if $B_{01} \ge 10^{2}$.
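Equation (2.6) requires only a one-dimensional integral, which is easy to evaluate numerically. The sketch below makes the following assumptions, which are my own choices and not from the paper:

- It uses plain composite Simpson's rule on a truncated range $[0, y_{max}]$, with a heuristic cutoff.
- It works on the log scale so that very large integrals do not overflow.
- The function name `bayes_factor_b01` is hypothetical.
- The failure times in the usage lines are made up for illustration.

```python
import math


def log_integrand(y, n, R):
    """log of exp(-R*y) * {y / (1 - exp(-y))}^(n-1); the y -> 0 limit is 0."""
    if y == 0.0:
        return 0.0
    return -R * y + (n - 1) * math.log(y / -math.expm1(-y))


def bayes_factor_b01(times, T, c_ratio=0.6449, intervals=20000):
    """Approximate B01 of (2.6) by composite Simpson's rule on the log scale."""
    n = len(times)
    if n < 2:
        raise ValueError("(2.6) requires n >= 2 failures")
    R = sum(times) / T
    # For large y the integrand behaves like y^(n-1) exp(-R*y), so this
    # heuristic upper limit lies far into its tail.
    upper = max(50.0, (10.0 * (n - 1) + 50.0) / R)
    h = upper / intervals                      # intervals must be even for Simpson
    logs = [log_integrand(i * h, n, R) for i in range(intervals + 1)]
    peak = max(logs)
    total = 0.0
    for i, lv in enumerate(logs):
        weight = 1 if i in (0, intervals) else (4 if i % 2 else 2)
        total += weight * math.exp(lv - peak)
    log_integral = peak + math.log(total * h / 3.0)
    return c_ratio * (n - 1) * math.exp(-log_integral)


even = [float(i) for i in range(1, 11)]    # failures spread evenly over T = 11
early = [0.1 * i for i in range(1, 11)]    # failures bunched near 0, T = 100
print(bayes_factor_b01(even, 11.0), bayes_factor_b01(early, 100.0))
```

With evenly spread failures, $R \approx n/2$ and $B_{01}$ exceeds one. With failures bunched at the start, $R$ is small, the integral is huge and $B_{01}$ is tiny. For extreme data sets one would return $\log B_{01}$ instead, to avoid underflow.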
3. ESTIMATING THE NUMBER OF FAULTS IN THE SYSTEM

The framework developed in Section 2 is used. The results in this section and the next one
are conditional on $M_1$. If the priors are proper, standard Bayesian inference is straightforward
(Akman 1985; Jewell 1985; Langberg and Singpurwalla 1985; Meinhold and Singpurwalla
1983).

It follows from (2.5) that

$$p(N, \beta) = \int_0^\infty p(N \mid \mu, \beta)\, p(\mu, \beta)\, d\mu \propto \{N(N-1)\}^{-1}\beta^{-1}. \qquad (3.1)$$

Also,

$$p(t \mid N, \beta) = \beta^{n} \exp\{-\beta T(R + N - n)\}\, N!/(N-n)!. \qquad (3.2)$$

Combining (3.1) with (3.2), and integrating over $\beta$, yields the posterior distribution of the
number of remaining faults $M = N - n$,

$$p(M \mid t) \propto \left\{\prod_{i=1}^{n-2} (M+i)\right\} (M+R)^{-n}, \qquad M = 0, 1, 2, \ldots \qquad (3.3)$$
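The integration over $\beta$ that takes (3.1) and (3.2) to (3.3) is a gamma integral. Written out with $N = M + n$, it is:

```latex
\begin{aligned}
p(M \mid t) &\propto \int_0^\infty p(t \mid N,\beta)\, p(N,\beta)\, d\beta
  \propto \frac{N!}{(N-n)!\, N(N-1)} \int_0^\infty \beta^{n-1} e^{-\beta T (M+R)}\, d\beta \\
&= \frac{(M+n-2)!}{M!}\, \frac{\Gamma(n)}{\{T(M+R)\}^{n}}
  \propto \left\{\prod_{i=1}^{n-2} (M+i)\right\} (M+R)^{-n}.
\end{aligned}
```

The same calculation shows that, given $M$, the posterior of $\beta$ is gamma with shape $n$ and rate $T(M+R)$. Because the product has $n-2$ factors, $p(M \mid t)$ decays like $M^{-2}$ for large $M$.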

The probability that the system has been fully debugged is simply $P[M = 0 \mid t]$. Interval estimates
of $N$, such as highest posterior density (HPD) regions, or one-sided intervals, may readily be
found from (3.3).
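Computing (3.3) in practice needs a truncation point, because the distribution is known only up to a constant and has a heavy $M^{-2}$ tail. The sketch below makes these assumptions:

- The cutoff `m_max` is my choice.
- The mass beyond the cutoff is approximated by $p(m_{max}) \cdot m_{max}$.
- `posterior_m` and `hpd_region` are hypothetical helper names.
- The failure times are made up.

The product in (3.3) is evaluated as $\log\Gamma(M+n-1) - \log\Gamma(M+1)$.

```python
import math


def posterior_m(times, T, m_max=100_000):
    """Normalised p(M | t) of (3.3) for M = 0..m_max, plus the approximate mass beyond m_max."""
    n = len(times)
    if n < 2:
        raise ValueError("(3.3) requires n >= 2 failures")
    R = sum(times) / T
    # log prod_{i=1}^{n-2} (M+i) = lgamma(M+n-1) - lgamma(M+1)
    logs = [math.lgamma(m + n - 1) - math.lgamma(m + 1) - n * math.log(m + R)
            for m in range(m_max + 1)]
    peak = max(logs)
    weights = [math.exp(lv - peak) for lv in logs]
    # p(M | t) ~ c / M^2, so sum_{M > m_max} is roughly weights[-1] * m_max
    tail = weights[-1] * m_max
    total = sum(weights) + tail
    return [w / total for w in weights], tail / total


def hpd_region(probs, level=0.95):
    """Collect the most probable values of M until their total probability reaches `level`."""
    order = sorted(range(len(probs)), key=lambda m: probs[m], reverse=True)
    chosen, mass = [], 0.0
    for m in order:
        chosen.append(m)
        mass += probs[m]
        if mass >= level:
            break
    return sorted(chosen), mass


early = [0.1 * i for i in range(1, 11)]   # made-up failure times, T = 100
even = [float(i) for i in range(1, 11)]   # made-up failure times, T = 11
probs_early, tail_early = posterior_m(early, 100.0)
probs_even, tail_even = posterior_m(even, 11.0)
region, mass = hpd_region(probs_even, 0.90)
print(probs_early[0], probs_even[0], region[0], region[-1], mass)
```

`probs[0]` is $P[M = 0 \mid t]$. Adding $n$ to the ends of the HPD set turns it into an interval estimate for $N$.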

In many applications, estimation of $N$ is an intermediate step in the solution of other
problems. However, if a point estimator of $N$ is required, it may be obtained from (3.3) by
combining it with an appropriate loss function. The posterior mode, $N_{mod}$, is the estimator which
corresponds to a zero-one loss function, so that, if the appropriate loss function is bounded, $N_{mod}$
may well be a good approximation. The posterior median (found by linear interpolation)
is the estimator which corresponds to one unbounded loss function, absolute error loss, and is also a useful summary
of the posterior distribution.
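The mode and a median of $N = n + M$ follow directly from (3.3). In this sketch the median is simply the smallest value of $N$ at which the cumulative posterior probability reaches 0.5. That is a simplification, not the linear interpolation described above. Other assumptions:

- The function name and the truncation `m_max` are hypothetical.
- The tail beyond `m_max` is approximated from the $M^{-2}$ decay.

```python
import math


def posterior_mode_and_median(times, T, m_max=100_000):
    """Posterior mode of N and a simple discrete median of N = n + M under (3.3)."""
    n, R = len(times), sum(times) / T
    if n < 2:
        raise ValueError("(3.3) requires n >= 2 failures")
    logs = [math.lgamma(m + n - 1) - math.lgamma(m + 1) - n * math.log(m + R)
            for m in range(m_max + 1)]
    mode_m = max(range(m_max + 1), key=logs.__getitem__)
    weights = [math.exp(lv - logs[mode_m]) for lv in logs]
    total = sum(weights) + weights[-1] * m_max   # rough M^-2 tail beyond m_max
    cumulative = 0.0
    for m, w in enumerate(weights):
        cumulative += w / total
        if cumulative >= 0.5:
            return n + mode_m, n + m
    raise ValueError("median lies beyond m_max; increase m_max")


even = [float(i) for i in range(1, 11)]   # made-up failure times, T = 11
print(posterior_mode_and_median(even, 11.0))
```

Both estimators are always finite. The posterior mean is not, because $\sum_M M \cdot M^{-2}$ diverges.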

Other point estimators of $N$ which are always finite include the modified maximum likelihood
estimator $N^*$ of Blumenthal and Marcus (1975), and the harmonic mean estimator of Joe and Reid
(1985). Watson and Blumenthal (1980) considered three other estimators, but their
performance in a simulation study was very similar to that of $N^*$, so I do not consider them
further here.

The four estimators, and $n$, were compared in a small simulation study
whose results are summarised in Table 1. $\beta$ was fixed at 1.0, and $T$ was set equal to $-\log(1-Q)$,
where $N$ and $Q$ were fixed at the values shown. $Q$ is thus the probability of a randomly chosen
bug causing the system to fail before time $T$. The results are conditional on $n > 1$.
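The simulation design can be sketched as follows, using Python's standard `random` module for the exponential variates. With $\beta = 1$, each fault fails before $T = -\log(1-Q)$ with probability $1 - e^{-T} = Q$. Replicates with $n \le 1$ are redrawn to match the conditioning on $n > 1$. The function names are hypothetical, and the four estimators themselves are not reproduced here.

```python
import math
import random


def simulate_failures(N, Q, beta=1.0, rng=random):
    """One replicate: N faults with exponential(beta) failure times, observed on [0, T]."""
    T = -math.log(1.0 - Q) / beta            # with beta = 1 this is T = -log(1 - Q)
    draws = (rng.expovariate(beta) for _ in range(N))
    times = sorted(t for t in draws if t <= T)
    return times, T


def simulate_given_more_than_one(N, Q, rng=random):
    """Redraw until n > 1 failures are observed, matching the conditioning for Table 1."""
    if N < 2:
        raise ValueError("n > 1 is impossible when N < 2")
    while True:
        times, T = simulate_failures(N, Q, rng=rng)
        if len(times) > 1:
            return times, T


rng = random.Random(1)
times, T = simulate_given_more_than_one(20, 0.5, rng)
print(len(times), T)
```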


The  most  striking  feature  of  Table  1  is  how  badly  all  four  estimators  performed;  none  did 
much  better  than  an  estimator  which  is  identically  equal  to  n.  Also,  no  one  estimator  was 
uniformly better than any other. For $Q = 0.9$, corresponding to the situation where the system is
close to being fully debugged, N ^ performed best, while for $Q = 0.25$, N performed best.
These results suggest that it would be better to report the full posterior distribution (3.3), or some
of its salient characteristics, than any one point estimator.


4. ESTIMATING SYSTEM RELIABILITY AND TIME TO FINAL DEBUGGING

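The reliability of the system is the probability that it operates without failure for a further,
specified, period of length $x$, say. This is equal to $P[X > x \mid t]$, where $X = t_{n+1} - T$ is the time to the
next failure. Now, $X = \infty$ if $M = 0$, and $P[X > x \mid M, \beta] = \exp(-M\beta x)$ $(M \ge 1)$, so that

$$\begin{aligned}
P[X > x \mid t] &= P[M=0 \mid t] + \sum_{M=1}^{\infty} \int_0^\infty \exp(-M\beta x)\, p(M, \beta \mid t)\, d\beta \\
&= P[M=0 \mid t] + \sum_{M=1}^{\infty} p(M \mid t)\left\{1 + \frac{M}{M+R}\,\frac{x}{T}\right\}^{-n}.
\end{aligned}$$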

$E[X \mid t]$ is always infinite, but we can calculate

$$E[X \mid t, M \ge 1] = T(n-1)^{-1}\left\{1 + R\,(1 - P[M=0 \mid t])^{-1} \sum_{M=1}^{\infty} M^{-1} p(M \mid t)\right\}.$$
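Both reliability quantities are sums over the posterior (3.3). The sketch below makes these assumptions:

- The $M$ range is truncated at a cutoff `m_max`, and the function names are illustrative, not from the paper.
- Beyond the cutoff, $M/(M+R)$ is taken to be 1, so the neglected tail contributes approximately $(1 + x/T)^{-n}$ times its mass.
- The failure times are made up.

```python
import math


def posterior_m(times, T, m_max=100_000):
    """Normalised p(M | t) of (3.3) for M = 0..m_max, and the approximate mass beyond m_max."""
    n = len(times)
    R = sum(times) / T
    logs = [math.lgamma(m + n - 1) - math.lgamma(m + 1) - n * math.log(m + R)
            for m in range(m_max + 1)]
    peak = max(logs)
    weights = [math.exp(lv - peak) for lv in logs]
    tail = weights[-1] * m_max                 # p(M | t) decays like M^-2
    total = sum(weights) + tail
    return [w / total for w in weights], tail / total


def reliability(times, T, x, m_max=100_000):
    """P[X > x | t]: probability of no failure during a further period of length x."""
    n, R = len(times), sum(times) / T
    probs, tail = posterior_m(times, T, m_max)
    value = probs[0]
    for m in range(1, m_max + 1):
        value += probs[m] * (1.0 + m * x / (T * (m + R))) ** (-n)
    # beyond m_max, M / (M + R) is essentially 1
    return value + tail * (1.0 + x / T) ** (-n)


def expected_wait_given_faults(times, T, m_max=100_000):
    """E[X | t, M >= 1] = T/(n-1) * {1 + R * sum_{M>=1} p(M|t)/M / P[M >= 1 | t]}."""
    n, R = len(times), sum(times) / T
    probs, tail = posterior_m(times, T, m_max)
    p_some_left = sum(probs[1:]) + tail        # avoids cancellation in 1 - P[M=0|t]
    inverse_moment = sum(probs[m] / m for m in range(1, m_max + 1))
    return T / (n - 1) * (1.0 + R * inverse_moment / p_some_left)


even = [float(i) for i in range(1, 11)]        # made-up failure times, T = 11
print(reliability(even, 11.0, 1.0), expected_wait_given_faults(even, 11.0))
```

As $x$ grows, the reliability falls towards $P[M = 0 \mid t]$, the probability that no fault remains.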

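The time to final debugging of the system is $Z = t_N - T$. $Z = 0$ if $M = 0$, while if $M \ge 1$, $Z$ is the
maximum of $M$ independent exponential random variables with mean $\beta^{-1}$. Thus

$$P[Z \le z \mid t] = P[M=0 \mid t] + \sum_{M=1}^{\infty} p(M \mid t) \sum_{k=0}^{M} (-1)^{k} \binom{M}{k} \left\{1 + \frac{kz}{T(M+R)}\right\}^{-n}$$

and

$$E[Z \mid t] = T(n-1)^{-1} \sum_{M=1}^{\infty} p(M \mid t)\,(M+R) \sum_{k=1}^{M} (-1)^{k+1} \binom{M}{k} k^{-1}.$$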
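The alternating sum in $P[Z \le z \mid t]$ loses almost all precision to cancellation once $M$ exceeds a few dozen. The sketch below therefore swaps in an equivalent computation:

- Given $M$, $u = \beta T(M+R)$ is gamma distributed with shape $n$ and unit rate.
- $P[Z \le z \mid M, t] = E\{(1 - e^{-\beta z})^M\}$ is evaluated by Simpson's rule over $u$.
- Terms with $M$ above a cutoff `m_cut` are dropped, so the result slightly understates the probability.

The cutoffs and function names are hypothetical, and the failure times are made up. I do not attempt $E[Z \mid t]$ here. Under (3.3) its terms behave like $(\log M)/M$ for large $M$, so truncated partial sums of that series keep growing with the cutoff.

```python
import math


def posterior_m(times, T, m_max=100_000):
    """Normalised p(M | t) of (3.3) for M = 0..m_max, and the approximate mass beyond m_max."""
    n = len(times)
    R = sum(times) / T
    logs = [math.lgamma(m + n - 1) - math.lgamma(m + 1) - n * math.log(m + R)
            for m in range(m_max + 1)]
    peak = max(logs)
    weights = [math.exp(lv - peak) for lv in logs]
    tail = weights[-1] * m_max                 # p(M | t) decays like M^-2
    total = sum(weights) + tail
    return [w / total for w in weights], tail / total


def prob_debugged_by(times, T, z, m_cut=1_000, m_max=100_000, intervals=400):
    """P[Z <= z | t], integrating over beta numerically instead of using the alternating sum."""
    n, R = len(times), sum(times) / T
    probs, _ = posterior_m(times, T, m_max)
    # Simpson weights times the Gamma(n, 1) density of u = beta * T * (M + R)
    upper = n + 12.0 * math.sqrt(n) + 30.0
    h = upper / intervals
    grid = []
    for j in range(intervals + 1):
        u = j * h
        simpson = 1 if j in (0, intervals) else (4 if j % 2 else 2)
        log_density = (n - 1) * math.log(u) - u - math.lgamma(n) if u > 0 else -math.inf
        grid.append((u, simpson * h / 3.0 * math.exp(log_density)))
    value = probs[0]                           # Z = 0 when M = 0
    for m in range(1, min(m_cut, m_max) + 1):
        c = z / (T * (m + R))                  # beta * z = u * c
        # E[(1 - exp(-beta*z))^M | M, t]: the maximum of M exponentials is <= z
        value += probs[m] * sum(w * (-math.expm1(-u * c)) ** m for u, w in grid)
    return value  # terms with M > m_cut are dropped: a slight underestimate


even = [float(i) for i in range(1, 11)]        # made-up failure times, T = 11
print(prob_debugged_by(even, 11.0, 20.0))
```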
5. EXAMPLES


I  now  apply  the  techniques  proposed  here,  as  well  as  those  of  Blumenthal  and  Marcus 
(1975)  and  Joe  and  Reid  (1985),  to  several,  previously  analyzed,  software  reliability  data  sets. 
The  results  are  given  in  Table  2. 


Table  2  about  here 


Example  1.  Goel  and  Okumoto  (1979)  gave  the  failure  times  of  a  piece  of  software  developed  as 
part  of  the  Naval  Tactical  Data  System.  These  data  had  previously  been  analyzed  by  Jelinski  and 
Moranda  (1972).  By  the  end  of  the  production  and  testing  phases,  which  lasted  540  days,  31 
failures  had  occurred. 

The Bayes factor $B_{01}$, at about $10^{-3}$, indicated decisive evidence for reliability growth, but
$P[M = 0 \mid t]$ was only 0.27, indicating that the system had probably not been fully debugged.
Indeed, three further failures later occurred.

The techniques proposed here gave similar results to the likelihood analysis of Joe and Reid
(1985). $M_{mod}$ and $\hat{M}$ were very close. The 0.5 likelihood interval, proposed as an interval
estimator by Joe and Reid (1985), had coverage probability close to 0.76, and was the same as
the 76% HPD region based on (3.3).

Example 2. Meinhold and Singpurwalla (1983) gave the failure times of a real-time command
and control system. These data have also been analyzed by Musa (1975), Goel (1985), and
Okumoto (1985). After $n = 7$ failures, the MLE of $N$ was infinite, and an analysis at this point
revealed differences between the present approach and a likelihood analysis. For example, the
0.5 likelihood interval was $2$ to $\infty$ and had coverage probability less than 0.6, but posterior
probability 0.88 from (3.3).

Meinhold  and  Singpurwalla  (1983)  suggested  a  Bayesian  analysis  with  a  proper  prior  for  N 
which  was  Poisson  with  mean  50.  This  yielded  a  posterior  distribution  for  M  concentrated 
between  21  and  57;  by  comparison  (3.3)  yielded  the  95%  HPD  region  0-155.  In  fact,  129  further 
failures  occurred. 

After $n = 136$ failures, the present approach and the likelihood analysis of Joe and Reid
(1985)  gave  results  which  were  in  close  agreement,  as  in  Example  1. 

Example  3.  Forman  and  Singpurwalla  (1977)  analyzed  a  data  set  consisting  of  107  failures.  The 
data  were  grouped,  and,  like  them,  I  have  assumed  that  the  average  time  of  occurrence  within 
each  group  was  at  the  center  of  the  time  interval. 

After $n = 8$ failures, the procedure of Joe and Reid (1985) produced an interval estimate for
$M$ which included all its possible values, but whose coverage probability was less than 0.64.
After $n = 99$ failures, the probability of eight or more faults remaining was less than $10^{-4}$. Thus,
the fact that eight more failures did occur casts doubt on the appropriateness of the model for
this data set.